@pavel-kirienko we can use this PR to discuss the design goals. I've added the ones we mentioned in the e-mail thread, but they're of course up for discussion from you or anyone. |
Thanks, @kjetilkjeka, the list makes sense. I suggest being more specific about feature-completeness. It would be great to eventually support the following features so that users could switch from C++ & libuavcan to Rust & uavcan.rs without losing anything in the way of convenience:
Making this list explicit in the design document should emphasize that we're aiming to create a full-featured library that is at least on par with Libuavcan, so that users won't be tempted to stick to C++ purely because of the lack of features they need in Rust. Concerning the implementation details, I think I should desist from commenting on them at this point, because I don't yet know Rust as well as I would like to. I tried to outline my vision of memory allocation in our mailing list exchange; it probably wouldn't make sense to copy-paste that here. |
What do you mean by this? Do these services need any more support than services in general?
When talking about Uavcan services it's very natural for me to think about RPC, would a generic implementation like the following provide what you need?
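The code that originally accompanied this question is not preserved in this transcript. As a stand-in, here is a minimal, purely illustrative Rust sketch of what such a generic service/RPC interface might look like; the `Node`, `Handler`, and `rpc_call` names are hypothetical, and the in-process dispatch is a toy substitute for a real CAN transport:

```rust
use std::collections::HashMap;

// A server-side handler: raw request bytes in, raw response bytes out.
// A real library would use DSDL-typed requests/responses instead.
type Handler = fn(&[u8]) -> Vec<u8>;

struct Node {
    handlers: HashMap<u8, Handler>, // service ID -> handler
}

impl Node {
    fn new() -> Self {
        Node { handlers: HashMap::new() }
    }

    fn register(&mut self, service_id: u8, handler: Handler) {
        self.handlers.insert(service_id, handler);
    }

    /// Blocking RPC call. In a real transport this would serialize the
    /// request into CAN frames and wait for the response transfer; here
    /// it dispatches in-process purely for illustration.
    fn rpc_call(&self, service_id: u8, request: &[u8]) -> Option<Vec<u8>> {
        self.handlers.get(&service_id).map(|h| h(request))
    }
}

fn main() {
    let mut node = Node::new();
    node.register(1, |req| req.to_vec()); // toy echo service on ID 1
    assert_eq!(node.rpc_call(1, b"ping"), Some(b"ping".to_vec()));
    assert_eq!(node.rpc_call(2, b"ping"), None); // no such service
    println!("ok");
}
```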
My first thought is that this should be outside the "main crate" but the library as a whole should provide this. I will add this to the document in some form.
I will add this as well. It might not be the most complex task, but we need to be smart about how to interface the driver to make porting as simple as it can be.
Yes, we need this. However, this will either require knowledge of DSDL, or hard-coding (at least) the NodeStatus frame into the application.
Agree
Agree, goals before implementation. I've used rpc_call as an example and discussed what dependencies we might have; I've tried to keep it as minimal as possible while still illustrating the points. I will continue to do that. |
Added some specification regarding feature completeness
These are regular services that can be easily implemented by the user in the application. However, the library has all of the context that is needed to implement these services within the library. This is not required, but makes the development of UAVCAN applications easier, because the user won't have to concern themselves with coding trivial stuff. Overall it reduces the amount of boilerplate code in the application and encourages users to implement richer UAVCAN interfaces.
Yes, UAVCAN services are essentially RPC. The implementation you've shown would be fine if it were non-blocking, otherwise, we'd be assuming stuff about the application, would you agree? Consider the way services are implemented in Libuavcan: http://uavcan.org/Implementations/Libuavcan/Tutorials/4._Services/. I think that is a quite successful and convenient API that scales well. It is asynchronous, but in the Linux platform driver, we have (ugly) wrappers that make it synchronous: https://github.com/UAVCAN/libuavcan/blob/master/libuavcan_drivers/linux/include/uavcan_linux/helpers.hpp#L80-L99. What do you think about that API?
You are right; I was in the wrong mindset here. The crates ecosystem enables a much better packaging paradigm than what we had to use with C/C++. I should get used to that.
Knowledge of DSDL is going to be required for other high-level features as well (e.g. time sync and dynamic allocation), I think it's mostly unavoidable. Are you concerned about unnecessary dependencies on the DSDL layer? |
I guess you're referring especially to the client implementation. I generally do like the API; the nice features we should preserve in the Rust implementation are the following:
Where I see a potential for improvement:
For RPC, I would say the most common use case is synchronous calls. Off the top of my head, I would say we should provide some blocking API for this.
If a feature is dependent on DSDL, I feel like it has a lot in common with application-level code.
I believe that most of this code is better off existing in its own crate (an example is the node ID allocator). NodeStatus might be an exception, since it's a special kind of message (the only one that a UAVCAN node is required to support). However, I have not thought this through. Maybe someone can come up with an example where a feature is dependent on DSDL and it's a good idea to have insight into the internals? Maybe we're a bit deep into implementation now, but I'm more than willing to believe that it's useful for stating the design goals in a precise fashion. I will add something along these lines.
I will also split out what I see as "close to app" functions into a point of their own. I've chosen the word "high-level" for now, but I don't like it. I hope you have a better suggestion? See what you think of the changes; we will of course iterate on top of them. |
- Divided between core and high-level functionality.
- Added RPC-like service frame as a goal.
Also, it may be useful to be able to refuse to provide any response. The current implementation of the Raft cluster makes use of this feature when it needs to make the invoking node believe that the local Raft node is offline.
Perhaps. Would
All good points, I agree. Some aspects of the current API were formed while the specification was not yet quite settled, so it may be a bit suboptimal in some ways.

Apropos RPC: unless I'm misunderstanding your intentions, invoking any library/application features from interrupt context sounds like a great way to get serious problems. I strongly suggest maintaining strict isolation between the driver, where IRQs can be processed, and the library, which should make no assumptions about the way the application prefers to handle hardware events. Consider Libuavcan, for example: there is one Big Dispatch Loop (I felt like capitalizing because of how big it is) which the control flow ends up in when the application invokes it.

My experience with reactive systems suggests that it is a classic event loop-based approach. It is used in libevent, roscpp, Python's asyncore, and tons of other software, so it seems like a solid and well-explored solution. Where your approach differs is that you are proposing to spawn threads on std-capable targets and use IRQs on embedded targets. I can see great value in the thread-based approach, but not in the IRQ-based one, because, as I see it now, it would complicate the driver/library/application interfaces significantly, increase the interrupt handling workload unnecessarily, and make high-level event handling in the application code more complex and dangerous. I propose that interrupts should be kept in the driver, and callbacks should be invoked from a single thread.

Moreover, if the above suggestion were adopted, it would make sense to employ it for all targets, even those that are capable of using std::thread, for reasons of consistency and simplicity; would you agree? The application would be able to detach event processing from the main library's thread into a separate thread on its own; the library could even provide some crutches to make that more convenient. Does that make sense?
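The single-threaded dispatch pattern described above could be sketched roughly like this; `Dispatcher`, `spin`, and the queue layout are hypothetical names used only to illustrate the pattern, not an actual API:

```rust
use std::time::{Duration, Instant};

struct CanFrame {
    id: u32,
}

struct Dispatcher {
    rx_queue: Vec<CanFrame>, // filled by the driver; IRQs stay on that side
    on_frame: fn(&CanFrame), // application callback, never run from an IRQ
}

impl Dispatcher {
    /// Drain pending RX frames, invoking callbacks from this single
    /// thread, until the queue is empty or the deadline passes.
    /// Returns the number of frames processed.
    fn spin(&mut self, timeout: Duration) -> usize {
        let deadline = Instant::now() + timeout;
        let mut processed = 0;
        while Instant::now() < deadline {
            match self.rx_queue.pop() {
                Some(frame) => {
                    (self.on_frame)(&frame);
                    processed += 1;
                }
                None => break, // a real loop would block on the driver here
            }
        }
        processed
    }
}

fn main() {
    let mut d = Dispatcher {
        rx_queue: vec![CanFrame { id: 1 }, CanFrame { id: 2 }],
        on_frame: |_f| {},
    };
    assert_eq!(d.spin(Duration::from_millis(10)), 2);
    assert_eq!(d.spin(Duration::from_millis(10)), 0); // queue drained
    println!("ok");
}
```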
I have no objection to futures and multicasts, except I would add that having multiple concurrent pending requests increases the worst-case memory usage, because the library might be required to process more incoming transfers at once when the responses arrive. It doesn't mean that the feature shouldn't be implemented; I'm just pointing out that it may have non-obvious side effects. In some parts of Libuavcan we go to some lengths to ensure that we don't accidentally invoke too many requests at once in order to conserve memory (e.g. see the class
Agree.
High-level or application-level sounds fine to me. |
Yes, you're of course perfectly correct. We shouldn't expose the IRQ just by giving a callback (or in any other way). That was me writing more than thinking.
You're still spot on.
Perhaps having an event loop as a core is the sensible solution; it is indeed both solid and well explored. What I'm currently exploring in my head is: could it make sense to have a core that does not require the event loop? In my head, the core functionality is translating a single UAVCAN frame into (possibly) multiple CAN frames (or the other way) and ordering them in a priority queue (probably eagerly, but perhaps alternatively lazily?). This only works if we can expect the driver to pop from the priority queue when it's ready for a new transmission, and push to a buffer after reception (this should be within scope for IRQ context, but there could exist a threaded variant as well for when std exists). By only using this core you get things like:
But you won't get things like:
Using this core will be the equivalent of using Libcanard. I suspect it would be usable even on 8-bit MCUs such as the ATmega64M1. The next step will then be to provide infrastructure on top of this core. This can be done either as a threaded variant or a "polled" variant. Maybe the threaded variant is a std::thread running the event loop, or maybe not. Both of these variants will provide a full-featured library.
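The "core without an event loop" idea above might be sketched as follows; the `TxQueue` type, the eager splitting, and the 7-byte chunking (leaving room for a UAVCAN tail byte in a classic CAN frame) are illustrative assumptions, not a committed design:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Eagerly split transfers into CAN frames, ordered so that the driver
/// (e.g. from a TX-complete interrupt) can simply pop the next frame.
/// Entries are (CAN ID, frame sequence number, payload chunk); a lower
/// CAN ID means higher priority, hence `Reverse` to turn the max-heap
/// into a min-heap on the ID.
struct TxQueue(BinaryHeap<Reverse<(u32, usize, Vec<u8>)>>);

impl TxQueue {
    fn new() -> Self {
        TxQueue(BinaryHeap::new())
    }

    /// Split `payload` into chunks of up to 7 bytes (one byte per frame
    /// is reserved for the tail byte in UAVCAN over classic CAN) and
    /// enqueue them by CAN ID priority.
    fn push_transfer(&mut self, can_id: u32, payload: &[u8]) {
        for (seq, chunk) in payload.chunks(7).enumerate() {
            self.0.push(Reverse((can_id, seq, chunk.to_vec())));
        }
    }

    /// Called by the driver when it is ready to transmit another frame.
    fn pop(&mut self) -> Option<(u32, Vec<u8>)> {
        self.0.pop().map(|Reverse((id, _seq, data))| (id, data))
    }
}

fn main() {
    let mut q = TxQueue::new();
    q.push_transfer(200, &[0u8; 10]); // splits into two frames (7 + 3 bytes)
    q.push_transfer(100, &[1, 2, 3]); // one frame, higher priority (lower ID)
    assert_eq!(q.pop().unwrap().0, 100); // higher-priority transfer goes first
    assert_eq!(q.pop().unwrap().0, 200);
    assert_eq!(q.pop().unwrap().1.len(), 3); // trailing chunk of the big transfer
    assert!(q.pop().is_none());
    println!("ok");
}
```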
I might be totally wrong here, but I'm not convinced yet that forcing the event loop is the simplest solution. My motivation for thinking about the alternative approach is:
I also believe that in this kind of implementation the differences would be a small enough part of the implementation not to hurt testing significantly.
I might prefer application-level over high-level actually |
I got so sidetracked talking about the vices of delivering IRQ into the library, I ended up almost contradicting my earlier propositions. Indeed, as I said on the mailing list, an event loop is not necessarily the best approach. I used it as a showcase against IRQ, but seeing now that I just misinterpreted your earlier message, the whole argument becomes irrelevant. Let me quote myself from the mailing list thread:
The above is just my vision of how things could be efficiently handled without any event loop at all. This is very similar to how things work in Libcanard, by the way. We could invoke callbacks from this "designated dispatching method", too, and it would hardly cost us anything. I certainly support your proposition to extract the barest-bones functionality from the library and turn it into a complete standalone module, and then provide higher-level modules that can be stacked together until the level of abstraction provided by Libuavcan is reached. If it were a design pattern, I would call it a telescoping package (nudge to #5). If I were you, I wouldn't expend any effort to optimize the solution for 8-bit stuff, but I'm not against it either. |
The discussion we're having is great! Let's take a step back and see how we can apply it to the design goals before we continue? Based on the current discussion I propose to add something like the following.
I've not mentioned 8-bit targets anywhere, as supporting them shouldn't be a core goal. But some effort should be put into supporting these MCUs, due to the goal regarding "usefulness in resource constrained apps". I believe, at least at this point, that it makes more sense to provide useful features related to resource-constrained apps in this library. What was the reason for splitting it into Libuavcan and Libcanard? |
The list you provided looks good, and I don't have anything to add at the moment. Should we consider the list of the core design goals settled, then?
Libuavcan was the very first implementation of the protocol, and when I started working on it, the specification of the protocol did not yet exist; I was experimenting. The primary goal was to design an easy-to-use library, in C++, that would be as feature-complete as possible while being compatible with embedded applications. I did not task myself with supporting low-end systems at all.

Some time later, when people started to pick up the standard, it turned out that many of them wanted to run UAVCAN on small MCUs, and even more people were keen not to touch C++ for some reason (weird). So at this point, the need for a very compact library written in C became apparent.

My point is that the primary reason for having two libraries rather than one is historical. Besides, the languages are different, too: C++ is better suited for complex systems, hence Libuavcan is written in it, yet people who work with low-end stuff want C. |
Specified how "assume as little as possible" should be interpreted.
Changed phrasing from "high-level" to "application-level" functionality
There is one last thing I would like to specify. It's easy to say "be robust and safe" but what does that mean? I need to think more about this one, but here are some random thoughts:
|
Correct me if I'm wrong, but "robust/reliable code" is that which behaves strictly according to the specification, with no unintended behavior or side effects. "Safety critical" code, for example, is just the code with exceptionally high guarantees that it adheres to its specification/documentation. If the above holds, what we need is to provide a clear specification for the library (at least in the form of descriptive documentation comments provided for each API entity, but preferably also have a design document that we're kind of working on right now) and a set of unit tests that would assure that the library behaves as specified.
This is an excellent question. Libuavcan handles this for the TX buffer by replacing lower-priority frames that exist in the buffer in favor of new higher-priority frames. Additionally, it assigns higher value to service frames than to message frames, because the messaging semantics provide for self-recovery from loss of messages (most message transmissions are cyclic, so if you missed one, you can always expect to get the next one), whereas the service call semantics are not self-recovering (if you lost a service call or response, you're out of luck; and if the service interface is not idempotent, you're in for a lot of trouble). I named this logic QoS, like in LAN/Internet networking. It is not a part of the UAVCAN specification, though. Libcanard, being such a naive implementation, does none of that.

Concerning the RX queue, neither Libuavcan nor Libcanard handle that directly, because the RX queue is situated in the driver, so the driver is responsible for handling things properly.
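The TX-side QoS behavior described above (evicting a lower-priority frame when the buffer is full) could be sketched like this; `BoundedTxQueue` and its eviction policy are a simplified illustration, not Libuavcan's actual data structure:

```rust
// A lower CAN ID means higher priority on the bus.
struct CanFrame {
    id: u32,
}

struct BoundedTxQueue {
    frames: Vec<CanFrame>,
    capacity: usize,
}

impl BoundedTxQueue {
    fn new(capacity: usize) -> Self {
        BoundedTxQueue { frames: Vec::new(), capacity }
    }

    /// Returns `true` if the frame was accepted. When the queue is full,
    /// the lowest-priority (highest-ID) queued frame is evicted in favor
    /// of a strictly higher-priority newcomer; otherwise the newcomer is
    /// rejected.
    fn push(&mut self, frame: CanFrame) -> bool {
        if self.frames.len() < self.capacity {
            self.frames.push(frame);
            return true;
        }
        // Queue full: locate the lowest-priority queued frame.
        if let Some(worst) = self
            .frames
            .iter()
            .enumerate()
            .max_by_key(|(_, f)| f.id)
            .map(|(i, _)| i)
        {
            if self.frames[worst].id > frame.id {
                self.frames[worst] = frame; // evict in favor of the new frame
                return true;
            }
        }
        false // new frame is not higher priority than anything queued
    }
}

fn main() {
    let mut q = BoundedTxQueue::new(2);
    assert!(q.push(CanFrame { id: 300 }));
    assert!(q.push(CanFrame { id: 200 }));
    assert!(q.push(CanFrame { id: 100 })); // evicts the frame with ID 300
    assert!(!q.push(CanFrame { id: 400 })); // lower priority: rejected
    assert!(q.frames.iter().all(|f| f.id <= 200));
    println!("ok");
}
```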
I'm not sure I understand what you mean here, could you clarify, please? Are you talking about pool allocation? |
The meaning I give to the word robustness (related to software) is more about what happens when the software is faced with something outside the specs. I would think of things like error recovery, error isolation, and failing in predictable ways. I think this is a common way to use the word? (At least Wikipedia somewhat agrees with me.) I can't define reliability in a precise way, but I feel like it's closer to predictability (which again requires the SW to adhere to the specs) than to merely adhering to the specs. I would agree that unit testing is a way to achieve reliability (as opposed to having limited use for assuring robustness).

Doing our best to adhere to the specs is something I take as an underlying assumption. But no spec is complete, and trying to make it complete is mostly not very useful either (maybe with the exception of formal verification). A just as important question is: what should the software do in the case of undefined behavior? Refuse to compile? Crash? Attempt to provide degraded service? The following is what I feel is the most important thing to bring up at this point:
I completely understand your logic, and I would say this approach is perfect for >99% of real systems. What is the option for when dropping frames is unacceptable?
Sounds like a good idea; my first thought is that we should probably do the same for this library. But we will have a thorough discussion about buffers and pool allocation some time in the near future, so let's save this for then.
I'm talking about the case where the CAN driver (erroneously?) accepts frames that will never be processed by the library. I guess the solution is to check for and throw away frames that the node doesn't subscribe to. It was primarily used as an example in the robust/reliable discussion and is not necessarily a hard problem to solve. |
In the mission-critical setting we should strive to minimize the number of runtime failures, which suggests the following error handling strategies, from most preferred to least preferred:
The library is not in the right context to provide such guarantees; the application is responsible for using the library in a way that does not cause failures the library itself can't control. Talking about pool overflow specifically, consider the memory pool sizing guidelines we provide for Libuavcan: http://uavcan.org/Implementations/Libuavcan/Tutorials/2._Node_initialization_and_startup/#memory-management The library should guarantee that if the memory pool is sized correctly, it will not overflow, by design. By the way, not sure if we discussed this earlier: Libuavcan and Libcanard allocate the TX queue in the memory pool too, as well as RX transfers, which is why it is critical to size the pool correctly.
The CAN driver cannot know which frames can be processed by the library. Its task is to receive whatever is available on the bus and let the library process it, so the library is supposed to discard the frames it doesn't need. Note that CAN acceptance filters do not completely relieve the library from unwanted frames either; they merely reduce the probability of accepting them. Here's a relevant piece of Libuavcan: https://github.com/UAVCAN/libuavcan/blob/f45be6fe58054c1de5cc4ae4a856d8e0ad1e4600/libuavcan/src/transport/uc_dispatcher.cpp#L129-L141 |
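The library-side filtering described above could be sketched minimally like this; `Dispatcher`, `accept`, and the subscription set are hypothetical names, with uavcan.protocol.NodeStatus's default data type ID (341) used as an example:

```rust
use std::collections::HashSet;

/// The driver delivers every frame it receives; the final decision of
/// whether a frame is wanted happens here, since hardware acceptance
/// filters only reduce (never eliminate) unwanted traffic.
struct Dispatcher {
    subscribed_ids: HashSet<u16>, // data type IDs with registered listeners
}

impl Dispatcher {
    /// Returns `true` if the frame should be kept for transfer
    /// reassembly; otherwise it is silently discarded.
    fn accept(&self, data_type_id: u16) -> bool {
        self.subscribed_ids.contains(&data_type_id)
    }
}

fn main() {
    let mut d = Dispatcher { subscribed_ids: HashSet::new() };
    d.subscribed_ids.insert(341); // e.g. subscribed to NodeStatus (DTID 341)
    assert!(d.accept(341));
    assert!(!d.accept(1030)); // not subscribed: discarded by the library
    println!("ok");
}
```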
I suggest the following:
and (unrelated)
what do you think @pavel-kirienko ? After these additions, I'm happy :) |
That's a fine set of goals. Although "more than reasonable coverage" does sound a bit vague - is that >100% coverage? :) |
I would love to have more than 100% coverage. Testing code that will be implemented in the future sounds great! Are you implying that 100% code coverage should be a goal? Realistically, is code-coverage a very useful metric at all? I would in many cases prefer to increase the functional coverage in "the tricky parts". I'm not really sure how to clearly express what kind of test coverage we should aim for, do you have any suggestions? Perhaps the solution is to be even more vague, (write tests where it's reasonable)? |
The statement "more than reasonable" doesn't make much sense, because doing anything else than reasonable is non-reasonable by definition. :D We're just nitpicking at this point though. I suggest replacing "The tests should provide a more than reasonable coverage" with "The tests should provide a reasonable coverage", and we should be good to go. |
Added specification of the two last points as well
This is the PR tracking the design goals document.
You may read the current version: here